A COMPUTATIONALLY EFFICIENT METHOD FOR DETERMINING SIGNIFICANCE IN INTERVAL MAPPING OF QUANTITATIVE TRAIT LOCI
This paper provides a brief introduction to the mapping of quantitative trait loci (QTL). An example of mapping QTL for root thickness in rice illustrates popular statistical methods used in QTL mapping. Interval mapping is used in conjunction with permutation testing to detect significant associations between genetic positions and quantitative traits while controlling the overall type I error rate. A recent technique that can greatly reduce the computational expense of permutation testing in QTL mapping is reviewed. Theory is provided for an extension of recent results that may lead to more powerful methods of QTL mapping through permutation testing.
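As a rough illustration of how permutation testing controls the overall type I error rate in a genome scan, the sketch below shuffles the trait values, records the maximum test statistic across all markers for each shuffle, and takes an upper quantile of those maxima as the significance threshold. It is a plain permutation test, not the computationally efficient technique the paper reviews. For brevity it uses the squared marker-trait correlation in place of an interval-mapping LOD score. The function names, the 0/1 genotype coding, and the parameters `n_perm`, `alpha`, and `seed` are all assumptions made for this example.

```python
import math
import random


def corr_sq(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant marker or trait carries no signal
    return sxy * sxy / (sxx * syy)


def max_stat(genotypes, trait):
    """Largest marker statistic over the scan.

    genotypes: list of markers, each a list of per-individual codes
    (hypothetical 0/1 coding); trait: list of phenotype values.
    """
    return max(corr_sq(marker, trait) for marker in genotypes)


def permutation_threshold(genotypes, trait, n_perm=200, alpha=0.05, seed=0):
    """Empirical (1 - alpha) quantile of the permutation null of max_stat."""
    rng = random.Random(seed)
    shuffled = list(trait)
    null_max = []
    for _ in range(n_perm):
        # Shuffling the trait breaks any marker-trait association
        # while keeping the correlation structure among markers.
        rng.shuffle(shuffled)
        null_max.append(max_stat(genotypes, shuffled))
    null_max.sort()
    k = math.ceil((1 - alpha) * n_perm) - 1
    k = max(0, min(n_perm - 1, k))
    return null_max[k]
```

A marker would be declared significant when its observed statistic exceeds `permutation_threshold(genotypes, trait)`. Because the threshold is based on the maximum over all markers, the comparison accounts for the multiplicity of the scan.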
Accounting for spot matching uncertainty in the analysis of proteomics data from two-dimensional gel electrophoresis
Two-dimensional gel electrophoresis is a biochemical technique that combines isoelectric focusing and SDS-polyacrylamide gel technology to achieve simultaneous separation of protein mixtures on the basis of isoelectric point and molecular weight. Upon staining, each protein on a gel can be characterized by an intensity measurement that reflects its abundance in the mixture. These measurements can then, in principle, be used to determine which proteins are differentially expressed under different experimental conditions. We propose an EM approach to identify differentially expressed proteins using an inferential strategy that accounts for uncertainty in matching spots to proteins across gels. The underlying mixture model has trivariate Gaussian components. The application of the EM algorithm is, however, not straightforward: the main difficulty lies in the E-step calculations because of the dependence structure of proteins within each gel. The usual model-based clustering approach is therefore inapplicable, and an MCMC approach is employed. Through data-based simulation, we demonstrate that our proposed method effectively accounts for uncertainty in spot matching and distinguishes differentially from non-differentially expressed proteins more successfully than a naïve t-test that ignores uncertainty in spot matching.
Stability of Random Forests and Coverage of Random-Forest Prediction Intervals
We establish stability of random forests under the mild condition that the
squared response ($Y^2$) does not have a heavy tail. In particular, our
analysis holds for the practical version of random forests that is implemented
in popular packages like \texttt{randomForest} in \texttt{R}. Empirical results
show that stability may persist even beyond our assumption and hold for
heavy-tailed $Y^2$. Using the stability property, we prove a non-asymptotic
lower bound for the coverage probability of prediction intervals constructed
from the out-of-bag error of random forests. With another mild condition that
is typically satisfied when $Y$ is continuous, we also establish a
complementary upper bound, which can be similarly established for the jackknife
prediction interval constructed from an arbitrary stable algorithm. We also
discuss the asymptotic coverage probability under assumptions weaker than those
considered in previous literature. Our work implies that random forests, with
their stability property, are an effective machine learning method that can
provide not only satisfactory point prediction but also justified interval
prediction at almost no extra computational cost.
Comment: NeurIPS 202
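To make the idea of an out-of-bag prediction interval concrete, here is a minimal sketch under several stated assumptions. A bagged ensemble of one-split regression stumps on a single input stands in for a random forest. The interval is taken to be symmetric, centred on the ensemble prediction, with half-width equal to an upper empirical quantile of the absolute out-of-bag errors. This is one common construction and is not claimed to match the paper's exact definition. All function names and the parameters `n_trees`, `alpha`, and `seed` are hypothetical.

```python
import math
import random


def fit_stump(xs, ys):
    """Fit a one-split regression stump on 1-D inputs (stand-in for a tree)."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    total = sum(y for _, y in pairs)
    best, best_gain = None, 0.0
    left_sum = 0.0
    for i in range(1, n):
        left_sum += pairs[i - 1][1]
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot split between identical x values
        right_sum = total - left_sum
        # Maximising this quantity is equivalent to minimising the split's SSE.
        gain = left_sum ** 2 / i + right_sum ** 2 / (n - i)
        if best is None or gain > best_gain:
            best_gain = gain
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (threshold, left_sum / i, right_sum / (n - i))
    if best is None:  # all x equal: predict the mean everywhere
        mean = total / n
        return (float("inf"), mean, mean)
    return best


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def oob_interval_forest(xs, ys, n_trees=100, alpha=0.1, seed=0):
    """Return a function mapping x to a (lower, upper) prediction interval."""
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    oob_sum = [0.0] * n
    oob_cnt = [0] * n
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        stump = fit_stump([xs[i] for i in idx], [ys[i] for i in idx])
        stumps.append(stump)
        in_bag = set(idx)
        for i in range(n):
            if i not in in_bag:  # point i was not used to fit this stump
                oob_sum[i] += predict_stump(stump, xs[i])
                oob_cnt[i] += 1
    # Absolute out-of-bag errors come for free from the fitting loop above.
    abs_err = sorted(
        abs(ys[i] - oob_sum[i] / oob_cnt[i]) for i in range(n) if oob_cnt[i] > 0
    )
    m = len(abs_err)
    k = min(m - 1, math.ceil((1 - alpha) * (m + 1)) - 1)
    half_width = abs_err[k]

    def predict_interval(x):
        centre = sum(predict_stump(s, x) for s in stumps) / len(stumps)
        return centre - half_width, centre + half_width

    return predict_interval
```

The out-of-bag errors are accumulated inside the same loop that fits the ensemble, which illustrates why such intervals add almost no computation beyond point prediction.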
- …